-
Notifications
You must be signed in to change notification settings - Fork 15.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add C# deterministic serialization #13160
add C# deterministic serialization #13160
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
@@ -452,7 +452,17 @@ public void WriteTo(CodedOutputStream output, Codec codec) | |||
WriteContext.Initialize(output, out WriteContext ctx); | |||
try | |||
{ | |||
WriteTo(ref ctx, codec); | |||
if (output.IsSerializationDeterministic()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I unit-test this ? Is this for old proto2 format ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean by that. Is what for proto2 format?
@@ -691,6 +691,54 @@ public void OneofSerialization_DefaultValue() | |||
}); | |||
} | |||
|
|||
[Test] | |||
public void MapSerializationDeterministicTrue_ThenBytesIdentical() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add more tests for different types of keys, once the general implementation approach is approved.
I'll get to this when I can - it may not be this week though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Various suggestions, but I'm happy with the overall approach.
/// maps are sorted on the lexicographical order of the UTF8 encoded keys.</description></item> | ||
/// </list> | ||
/// </remarks> | ||
public void UseDeterministicSerialization() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should just be a property:
public bool DeterministicSerialization { get; set; }
I'm not hugely keen on the stream being mutable like this, but it appears to be the design in other languages.
(Naming could be IsDeterministic
or just Deterministic
... I'm not hugely bothered on that front.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored to property: Deterministic
@@ -452,7 +452,17 @@ public void WriteTo(CodedOutputStream output, Codec codec) | |||
WriteContext.Initialize(output, out WriteContext ctx); | |||
try | |||
{ | |||
WriteTo(ref ctx, codec); | |||
if (output.IsSerializationDeterministic()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean by that. Is what for proto2 format?
/// change in the future) | ||
/// <list type="bullet"> | ||
/// <item><description>Sort map entries by keys in lexicographical order or numerical order. Note: For string | ||
/// keys, the order is based on comparing the Unicode value of each character in the strings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each Unicode character, or each UTF-16 code unit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to UTF-16 code unit
} | ||
else | ||
{ | ||
WriteTo(ref ctx, codec, list); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better not to duplicate the WriteTo
call here. Something like:
IList<KeyValuePair<TKey, TValue>> listToWrite = list;
if (output.IsDeterministic)
{
listToWrite = new List<KeyValuePair<TKey, TValue>>(list);
listToWrite.Sort(...);
}
WriteTo(ref ctx, codec, listToWrite);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored to call GetSortedListCopy
@@ -469,7 +479,24 @@ public void WriteTo(CodedOutputStream output, Codec codec) | |||
[SecuritySafeCritical] | |||
public void WriteTo(ref WriteContext ctx, Codec codec) | |||
{ | |||
foreach (var entry in list) | |||
if (ctx.state.CodedOutputStream is not null && ctx.state.CodedOutputStream.IsSerializationDeterministic()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be:
if (ctx.state.CodedOutputStream?.IsDeterministic == true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored to if (ctx.state.CodedOutputStream?.Deterministic ?? false)
// We can't sort the list in place, as that would invalidate the linked list. | ||
// Instead, we create a new list, sort that, and then write it out. | ||
var sorted = new List<KeyValuePair<TKey, TValue>>(list); | ||
sorted.Sort((pair1, pair2) => Comparer<TKey>.Default.Compare(pair1.Key, pair2.Key)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the repetition here, I'd probably extract the logic into a method of GetSortedListCopy
or similar. Otherwise if we tweaked how the comparer is set up, it could end up inconsistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored to GetSortedListCopy
if (output.IsSerializationDeterministic()) | ||
{ | ||
var sorted = new List<KeyValuePair<TKey, TValue>>(list); | ||
sorted.Sort((pair1, pair2) => Comparer<TKey>.Default.Compare(pair1.Key, pair2.Key)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Comparer.Default begs the question as to whether the default string comparison is ordinal or not. I believe it is, but I'm not entirely sure.
I'd at least add a comment here - and possibly also note that map keys are guaranteed to be bools, integers or strings; never bytestrings or messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is NOT ordinal by default, explicitly using StringComparer.Ordinal now
Thanks for the additional commit - I'll review on Monday. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this looks good to me. Will rerun tests.
@@ -160,6 +160,33 @@ public long Position | |||
} | |||
} | |||
|
|||
/// <summary> | |||
/// Configures serialization to be deterministic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps "Configures whether or not serialization is deterministic." (It's a property, but the current text sounds sounds like it's a method.)
/// The deterministic serialization guarantees that for a given binary, equal messages (defined by the | ||
/// equals methods in protos) will always be serialized to the same bytes. This implies: | ||
/// <list type="bullet"> | ||
/// <item><description>Repeated serialization of a message will return the same bytes</description></item> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: add period
/// Configures serialization to be deterministic. | ||
/// </summary> | ||
/// <remarks> | ||
/// The deterministic serialization guarantees that for a given binary, equal messages (defined by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "The deterministic serialization" => "Deterministic serialization"
The tests appear to be failing in CI - I'll take a look. |
#12881 @jskeet review please :-) as this is my first contrib to protobuf, any design / performance feedback is very sought after Closes #13160 COPYBARA_INTEGRATE_REVIEW=#13160 from fmg-lydonchandra:feature/12881_cs_serialization_deterministic ab7e01b FUTURE_COPYBARA_INTEGRATE_REVIEW=#13160 from fmg-lydonchandra:feature/12881_cs_serialization_deterministic ab7e01b PiperOrigin-RevId: 567635727
#12881 @jskeet review please :-) as this is my first contrib to protobuf, any design / performance feedback is very sought after Closes #13160 COPYBARA_INTEGRATE_REVIEW=#13160 from fmg-lydonchandra:feature/12881_cs_serialization_deterministic ab7e01b FUTURE_COPYBARA_INTEGRATE_REVIEW=#13160 from fmg-lydonchandra:feature/12881_cs_serialization_deterministic ab7e01b PiperOrigin-RevId: 567635727
#12881
@jskeet review please :-) as this is my first contrib to protobuf, any design / performance feedback is very sought after